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Multigrid methods for the Navier-Stokes equations at low speeds and large temperature 
variations are investigated. The compressible equations with time-derivative preconditioning and 
preconditioned flux-difference splitting of the inviscid terms are used. Three implicit smoothers 
have been incorporated into a common multigrid procedure. Both full coarsening and semi- 
coarsening with directional fine-grid defect correction have been studied. The resulting methods 
have been tested on four 2D laminar problems over a range of Reynolds numbers on both uniform 
and highly stretched grids. Two of the three methods show efficient and robust performance over 
the entire range of conditions. In addition none of the methods have any difficulty with the large 
temperature variations. 
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1. INTRODUCTION 

In a now classic work, Brandt [1] defined optimal multigrid convergence as reduction in the 
discrete system error to the level of the truncation error in a computational effort that is a small 
multiple of the effort to evaluate the discrete system residual. Today this is termed textbook 
multigrid efficiency (TME). While it is well known that TME can be achieved for fully elliptic 
problems, for the equations of fluid d)mamics this performance has only been approached in a 
few simplified problems. Brandt and Yavneh [2] solved the incompressible Navier-Stokes 
equations for high-Reynolds number entering flows on a uniform rectangular grid. Thomas et al. 

[3] extended this work and applied it on a stretched rectanguleir grid to the boundary layer over a 
finite flat plate. Roberts et al. [4] solved the incompressible Euler equations on both unstructured 
triangular and structured quadrilateral grids for flows through a simple channel and over a 
symmetric airfoil. Roberts et al. [5] added nonlocal farfield boundary conditions and again 
applied it on a quadrilateral grid to the flow over a synunetric airfoil. A common feature of each 
of these works is that the fluid equations are separated into an advection part (advection- 
diffusion part for the viscous problem) and an elliptic part. The advection part is solved by 
marching in the downstream direction and the elliptic part is solved by multigrid. However, at 
this juncture it is not evident how these approaches are to be extended to general viscous flows 
with large separated regions. 

Hence much effort continues to improve more general multigrid solvers and to extend them 
to a variety of problems. We note a number of recent efforts in the following. Pierce and Giles 
[6] examined methods for improving the performance of Runge-Kutta based solvers for 
compressible turbulent flows on stretched meshes. Steelant et al. [7] did the same for a number 
of implicit solvers applied to low Mach number laminar and turbulent flows. Amaladas and 
Kamath [8] investigated the performance of several implicit time-marching methods together 
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with a number of upwind schemes for 2D inviscid and viscous compressible flows. Three papers 
have dealt with multigrid methods for the 3-D incompressible Navier-Stokes equations. Drikakis 
et al. [9] introduced a characteristics-based upwinding and used a smoother based on artificial 
compressibility (AC) and Runge-Kutta time-marching. Yuan [10] investigated three implicit 
time-marching smoothers together with AC. Montero etal. [11] evaluated both alternating-plane 
smoothers with standard coarsening and plane-implicit smoothers with semi-coarsening for the 
steady equations on staggered grids. 

We next take note of a number of efforts to extend or improve the capability of multigrid 
solvers to treat a variety of physical phenomena. Epstein et al. [12] presented a 3-D compressible 
Navier-Stokes solver that used ENO differencing with defect correction together with Runge- 
Kutta time-marching for transonic turbulent flows. Sheffer et al. [13] presented a 2-D solver for 
compressible reacting flows that used a point implicit treatment of the chemical source terms, 
upwind differencing, and explicit time-marching. Gerlinger et al. [14] addressed the same 
problem, used point implicit treatment of the source terms with central differencing and matrix 
dissipation, added a two-equation turbulence model, and employed an implicit LUSGS solver as 
a smoother. As a final example we note the use of multigrid by Caughy [15] in a temporal 
subiteration process to converge the equations at each time step in the unsteady flow past both 
fixed and moving cylinders of square cross-section at moderate Reynolds numbers. 

The present work is directed toward the development of efficient robust multigrid solvers for 
the compressible Navier-Stokes equations at low speed with large temperature variations. This is 
important for internal flows with heat transfer or combustion. To this end we employ time- 
derivative preconditioning to remove stiffness at low Mach numbers and preconditioned flux- 
difference splitting for the inviscid terms so that the implicit operator will be diagonally 
dominant. Three promising implicit smoothers have been incorporated in a common multigrid 
procedure. The resulting methods have been tested on four 2D laminar problems over a range of 
Reynolds numbers with both uniform and highly stretched grids. Convergence results from each 
of these studies are used to evaluate the relative strengths and weaknesses of the three solvers. 

2. GOVERNING EQUATIONS 


The governing equations are written in terms of nondimensional variables defined as follows 
(starred quantities are dimensional and subscript r denotes a reference value); 



where jc and y are space dimensions, t, p, u, v, T, p, p and ^are time, pressure, x and y velocities, 
temperature, density, viscosity and thermal conductivity, respectively. The primary 
nondimensional parameters are given by 
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which are, r espectively, the Mach, Reynolds, Prandtl, Froude and Eckert numbers where 

is the reference speed of sound, y is the ratio of specific heats, R is the gas constant. 
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Cp^ is the constant pressure specific heat, and g is the gravitational constant. Finally, for low 
Mach number flows, it is convenient to subtract off the large background pressure by setting 


P = P + P (1) 

with p = {yMa^y^ . 

The two-dimensional compressible Navier-Stokes equations with time-derivative 
preconditioning are written in vector, conservation-law form 

P;'—+—(E-E^)+—(F-Fy = H, (2) 

at ox ay 

where 

U ={p,pu,pv,ef , 

E = (pu, pu^ + p', puv, puh, Y , 

F = {pv, puv, pv^ -I- p, pvh, Y , 

(3) 

dx dy 

p + 

dx dy 

H = Fr-'(0,0,-p,-EcpvY, 

Q = (p',u,v,TY , 


is the preconditioning matrix, e = ph, - fc p' is the total energy, and h, =T +jEc(u^ + v^) 
is the total enthalpy. Note that the source vector H contains a gravitational body force in the 
negative y-direction and also that subtraction of the constant background pressure p does not 
change the basic form of the equations. 

The density is given by the equation of state for an ideal gas 


p^{\ + p'lp)lT, 
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and the diffusion coefficient matrices are given by 
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The preconditioning matrix is adapted from the work of Turkel [16]. First consider the 
inviscid equations written in terms of the primitive variables V = {p\u,v,S'Y , with 
dS' = dp' -c^d p and Y = Ma~^T . These are written 


where 
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For this system the simplest form of Turkel ’s preconditioner is written 
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where jc^ and 0^ is chosen to bring the system eigenvalues closer together at low 

Mach numbers. The eigenvalues of are given by 


with 
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The conservation-law form is then given by =MP^M ' with M =dU/dV . Finally, following 
Choi and Merkle [17], we express Eq.(2) in terms of the “viscous” variables Q 




( 11 ) 


where P'^ =MP-'N and N = dV/dQ. 

For most flows it is sufficient to set 0 equal to the square of the local speed together with 
lower and upper limits. However, for some low Reynolds number cases, convergence is 
improved by introducing the viscous limit of Choi and Merkle [17]. Hence we take 

0 = min[max(^o^M^ + v^y5„^),c^] , (12) 


where 0 is a user specified lower limit and 
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( 13 ) 


^ 2 _ a{a-l)u^ 

\+{a-\)u^!c^ 

Here < 3 f = RCV//?e^ , RCV is the ratio of CFL to von Neumann numbers (also user specified) 
and Re^ is the local cell Reynolds number in the jc-direction. 

3. DISCRETE FORMULATION 

The governing equations are solved on a stretched rectangular grid with unknowns stored at 
cell centers. While extension to a general curvilinear coordinate system is straightforward, the 
simpler system is sufficient for our current purposes. Integration of Eq.(l 1) over a cell together 
with Euler implicit time differencing gives 

Pg ' (dxdy/dt) AQ = , (14) 

where dx and dy are cell dimensions, dt is the time step, AQ = Q''*^ - Q " , and the area-weighted 
residual R is given by 


R = S^(E- R’^dx-'S^Q - R’^dy-'SyQ) dy 

+S^ (F - R^^dx-'S^Q - R^dy-'S^Q) dx-Hdxdy. ^ ^ 

Here and are central difference operators in x and y, respectively. 

The cell-face in viscid fluxes E and F in Eq.(15) are approximated by preconditioned flux- 
difference splitting [18]. This choice is normally dictated by the need for a more accurate 
capturing of discontinuities such as shocks or contact surfaces. In the present case, however, 
since we intend to employ implicit methods, this choice is made as a means of obtaining positive 
inviscid contributions to the global matrix to improve the efficiency of the relaxation schemes. It 
should be noted that with time-derivative preconditioning all the wave speeds are of the same 
order, see Eq.(9), and the matrix condition number is significantly reduced at low Mach 
numbers. Hence we write 

£ = i[E(t/-) + W)]-ip;'|RA|(f^"-t^")^ (16) 

where U~ and are, respectively, left and right states, = dE/dU , denotes evaluation at 
an average state, and P~'\P^AJ= MP~^\P^Ap\M ~^ . Here IRpA^I is computed using the eigenvectors 
and absolute eigenvalues of PpA^ . Eq.(16) is rewritten in terms of the “viscous” variables Q 

E = \[E{Q-) + E{,Q^)]-\A\Q^ -Q-) , (17) 

where A“ = MP~^\PpAp\N . The left and right states are given by an upwind-biased 
interpolation using van Leer’s MUSCLE approach [19] 
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(18) 


ejjj = e j ± i c'. [a T + (1 ± ] a.; . 


where j represents a cell-averaged value, S~ and are backward and forward differences in 
X, respectively, and for <7;^ =1, determines the spatial accuracy. Setting =-l corresponds 

to a fully upwind scheme, K^=+\ to a central difference scheme, and to a third order 

upwind-biased scheme. Setting cr, = 0 reduces the scheme to first order. In the present 
implementation the average state Q is taken as a simple average of Q~ and Q* and the use of 
limiters is not required. The cell-face flux F in the y direction is approximated by expressions 
similar to Eqs.(16) and (17) with replaced by = dF/dU and Eq.(18) for Qf±y^j is replaced 
by one for with interpolation in y instead of x. 

Eq.(14) is linearized about known values at level n and only first order contributions from E 
and F are retained on the left hand side 


[Pg * dxdyjdt + (S; A^) + 5^ (SJA') - 5^R^dx-^S^ } dy 

+ {S; + S; - S^R^dy-^S^ }dx-Ddxdy'\AQ = -R’' , 

where 5* and S* are half cell shift operators in ±x and ±y , respectively, and 

A* =\{A±A“),B^ =\{B±B“), 


(19) 


( 20 ) 


with A = MApN, B = MB^N and D = dH/dQ . Note that A and B are evaluated at the average 
state and that the variations of A“ and B° with Q have been neglected. 

The discrete formulation is completed by the specification of the time step dt. We set 

dxdy/dt = [r{P^A^)dy + r{P^B^)dx'\IC¥L , ( 21 ) 

where the spectral radii r are given by 

r{PpBp)=\v-u^)^ + c^^ . 


Here and are given by Eqs.(lO), while and given by the same expressions 

with M replaced by v. 


4. BOUNDARY CONDITIONS 

With the implicit relaxation methods used in this work, very simple boundary condition 
treatments have proven quite effective. Boundary values are located on the physical boundary, 
one half cell from the nearest interior point. The surface fluxes are obtained directly from Eqs.(3) 
with the normal derivatives in the viscous fluxes approximated by one-sided second-order 
differences. At an inlet «, v, and T are specified, while p' is obtained by linear extrapolation 
from the interior. At a solid wall u and v are specified, T is either specified or obtained from a 
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second order one-sided adiabatic wall condition, and p' is again found by linear extrapolation. 
Finally, at an outlet surface p' is specified and u, v, and T are obtained by linear extrapolation 
from the interior. On the left-hand side of Eq.(19), boundary conditions on AQ are treated 
implicitly. Thus, where a component of Q is specified, the corresponding component of AQ is 
set to zero. On the other hand, where an element of Q is found by linear extrapolation or 
adiabatic wall condition, that element of AQ is set equal to its value at the interior point one half 
cell away. These conditions are represented by a simple transfer relation, 

AQ,=L,AQ, (23) 

where AQ^ is the boundary value, AQ is the value at the interior point, and is a diagonal 
matrix. At an inlet or solid wall with T specified, = Diag(l, 0,0,0) ; at an adiabatic wall, 

= Diag(l, 0,0,1) ; and at an outlet, Z^ = Diag(0, 1,1,1) . 

5. SMOOTHING METHODS 


It is generally recognized that the smoothing or relaxation method is the most important part 
of a multigrid solution process. In the present work we investigate three implicit methods as part 
of a conunon multigrid process. Each of these can be found in the literature, hence only a brief 
description is given here. The descriptions are presented in terms of the global matrix form of the 
discrete system of equations. Expansion of Eq.(19) gives 


A,; + A.y ^Oi.H + ^Qi.j + A.y + A.y = - Kj > 


(24) 


where the block matrix coefficients, A to E , are given in Appendix A. When the cell (i, j) is 
located adjacent to a boundary, Eq.(23) is used to eliminate t^ boundary value, the coefficient of 
the boundary value is set to zero, and the central coefficient 5, ^ is modified accordingly. An 
example of such a modified coefficient is also given in the Appendix. 

The first smoother is Line Block Gauss-Seidel (LBGS). For sweeps in the +x direction, this 
is written as 




KAj = c,j 


B' jAOi j = -R,J - A.y Aa..,y - A.AQl-X J 

aq,j = aq:j-^^jAQ,j^, 


forj = l...n^ 
}for 7 = n^...l 


for I = l...n_ , 


(25) 


where C'j and AQjj are stored along the line and there is one LU decomposition of B'j per cell 
during the first y-sweep. For cases with a predominant flow in the x direction, LBGS is 
implemented by alternately sweeping in the +x and -x directions. For predominantly 
recirculating flows, we use a symmetric pattern of sweeps in the -f-x , +y , -y , and -x 
directions. 

The second smoother is LU Symmetric Gauss-Seidel (LUSGS). This is written 
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[ fori = l...n^,y = l...n^ , 


KjKj=E,j 


. - AAa^j-: - 


AQ,j = AQlj - C;,AQ,j^, - £',Aa,,, } for / = n, . . . 1, 7 = n, . . . 1 , 


( 26 ) 


where C'j , E'j and AQ^j are stored over the field and there is one LU decomposition of j 
per cell in the first sweep. Note that this form of LUSGS requires less global storage and one less 
LU decomposition per cell than the usual form. For cases with a predominant flow in the x 
direction, LUSGS is implemented by alternately switching the directions of the first and second 
sweeps. 

The third and final smoother is Block Incomplete LU decomposition (BILU). This is written 


Kj=Ei.j-AAj-^-D,jEUj 


B C' =C 
B' E' =E. 

uy uy ^y 


KMj = -R, j - a.Mh - J 


for/ = l...n,,y = l...M , 


(27) 


AG/.y = AG* y - Cl jAQ. j^, - E' jAQ,^, j } for i = . . . 1 , 7 = «^ . . . 1 , 


where C'j , E[^ and AG*y are stored over the field and one LU decomposition of B{^ per cell is 
performed during the first sweep. Again for cases with a predominant flow in the x direction, 
BILU is implemented by alternately switching the directions of the first and second sweeps. 

In each of the three smoothers, the interior unknowns are updated by = G"y + AG,,y only 
after AG,,y has been obtained for the entire field. Following this the boundary values of G”J' are 
computed based on the new interior values. 

6. MULTIGRID AND DEFECT CORRECTION 


Relaxation methods, such as those of the previous section, are in general much more 
efficient at reducing short-wavelength error components on a given grid than those of longer 
wavelength. Multigrid seeks to overcome this problem by transferring the long-wave 
components of the solution to a sequence of coarser grids where relaxation is more effective and 
much cheaper. Since the FAS-FMG (full approximation scheme - full multigrid) technique used 
in this work has been well documented in the literature, the present description of the multigrid 
process will be brief. The focus will, instead, be on the current implementation and in particular 
on those aspects that were found to be important in achieving a fast Navier-Stokes solver. 

Introduce a sequence of grids, ^ = m to 1 , where m is the finest and each grid k < m is 
coarser than grid k +1 in one or both directions. For k = m wc seek the solution of the nonlinear 
system 
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A^(2*) = 0, (28) 

with N representing the area- weighted residual given by Eq.(15). On any grid k let S* denote a 
single smoothing pass of the system 


iV(Q*) = F*, (29) 

here N is an approximation to N in which one or both convective operators may be replaced by 
a first-order discretization and is defined below. Next introduce a fine-to-coarse restriction 
operator for unknowns, a restriction operator /^^.j for residuals, /?* = F* - ) . ^nd a 

coarse-to-fine prolongation operator for corrections. With these definitions, the FAS 
multigrid cycle M * for improving an approximation (2* is defined recursively as follows; 

ForiS: = m, F“ N[Q^)-N{Q'^). 

For k = l , solve Eq.(29) by severd smoothing sweeps. 

For k > 1 , do these steps: 

(a) smooth on grid k, 

(b) restrict <2* to grid k-l, 

Qk-l^fk-lQk^ 

(c) restrict F* to grid ^ - 1 and form F'"“' , 

F*-* 

(d) perform / multigrid cycles on <2*~* , 

2*-' ^(m 2*-', 

(e) prolong corrections to grid k, 

< 2 * 2 ' + iL ) , 

(f) smooth on grid k, 


In general v, and Vj oan be considered functions of the grid level k. 

For y = l,a. V-cycle, V(v^,V 2 ) , is obtained. For 7 = 2 the more robust W-cycle, ^(VpV'j) , 
is obtained. We also introduce the F-cycle, which provides coarse grid performance that is 
intermediate between that of the V- and W-cycles. This is defined recursively as a ^ = 2 cycle in 
which the first iteration at grid k is an F-cycle and the second is a V-cycle. Finally, we obtain the 
full FAS-FMG technique by starting the computation on a very coarse grid, iterating to 
‘convergence’ with the FAS process, and interpolating the result to obtain initial values on the 
next finest grid. In this way the first approximation on the finest grid is already close in much of 
the domain, an important consideration in non-linear problems. Convergence criteria for each 
stage in the FAS-FMG process as used in the present work are described in the next section. 
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The sequence of grids, ^ = m to 1 , is given by either full or semi-coarsening. Li full 
coarsening, each cell on grid k is obtained by combining four cells on grid ^ + 1 such that there 
are half as many cells in each direction on grid ^ as on + 1 . This tends to work well for flows 
that are predominantly recirculating and where the grid is only moderately stretched. For flows 
with a predominant flow direction and strong grid stretching in the cross-stream direction, which 
is common in many engineering applications, a more robust albeit more expensive solver can be 
obtained by using semi-coarsening. In this case each cell on grid k is obtained by combining two 
cells in the flow direction on grid k + l such that there are half as many cells in the flow 
direction and the same number in the cross-stream direction on grid k as on k + l. Both of these 
situations will be illustrated with sample problems. 

The restriction operators, 7*^., for unknowns and 7^^, for residuals, both use full weighting. 
For 7*^, interior values are given by an area- weighted average over the fine grid cells from 
which the coarse cell is constructed. The unknown coarse-grid boundary values are then updated 
using the methods of section four. Since the residuals are conservatively differenced and area 
weighted, 7*^, is obtained by summing the fine-grid fluxes over the boundaries of the coarse grid 
cell and then adding the contributions from the fine-grid source terms. The prolongation 
operator, for corrections, is given by bilinear interpolation in the equally spaced 
computational domain. Once the interior points have been corrected, the unknown boundary 
values are again updated by the methods of section four. The same method is also used to 
interpolate converged results from a coarse grid to initialize values on the next finest grid. 

Another way to increase solver robustness is through the use of fine-grid defect correction. 
Normally this involves using first-order convective operators in both directions in JV . This 
produces a large fine-grid source term F” and substantially slows convergence. In the current 
work it has been found that using a first-order operator in the flow direction while retaining a 
higher order in the cross-stream direction gives an efficient yet quite robust solver. The source 
term F" is updated every one to four fine-grid sweeps. 

The multigrid solvers employed in this work have been coded to permit V-, W-, or F- cycles, 
including the possibility of k-dependent V', and Kj . While all the computations reported here 
were performed with W(1,0) from the fine grid, as these were the most robust for all methods on 
all problems, higher values of k, were sometimes used during the starting process on coarser 
grids. Finally, we note that the different sweeping patterns introduced in Section 5 in conjunction 
with each of the smoothers have been interleaved with the multigrid process. A sweep counter is 
established for every grid level and on each visit to that level the next direction in the sweep 
pattern for that grid is performed. This proved to be sufficient to give all the convergence 
benefits of the sweeping pattern. 

7. CONVERGENCE CRITERIA 

In establishing convergence criteria for an FAS-FMG procedure it is useful to employ a 
measure that doesn’t change by a large amount from one grid to the next. To this end we 
introduce the following residual norm 




<•. 7=1 


7dA* 


(30) 
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where and n* are the number of cells on grid kinx and y, respectively, and is the 
average cell area on level k. Then on the finest grid m convergence is taken as 

L.R"' <10-^. (31) 

In most cases this produces at least four significant figures in the solution. For intermediate grids 
k > 1 in the FAS-FMG process, convergence before interpolating to the next finer grid is taken 
as 

Z^^*<10-", (32) 

with a maximum of 50 sweeps on level k, and for the coarsest grid k = 1 ‘solution’ is given by 

L.R' <L,R'^ /\0, (33) 

where now L^R'' is the most recent error on the current finest grid and a maximum of four 
sweeps is taken on the coarsest grid. 

8. RESULTS AND DISCUSSION 

Four numerical test problems have been chosen to evaluate the efficiency and robustness of 
the multigrid method with each of the three smoothers. In three of the problems, existing 
numerical results are used to check the accuracy of the current approach. In two of the problems, 
large temperature variations are used to test the ability of the solvers in this area. For each 
problem the behavior of the solvers is tested over a range of Reynolds numbers Re on uniform 
grids. In addition, although uniform grids are adequate to resolve the physics in each case, the 
solvers are also tested over a range of stretched grids for each problem. Finally, since each of 
these cases is in the incompressible range, the reference Mach number Ma is set to 10”^ for all 
computations. 

Note all computations were performed on an SGI Octane workstation with an R12000 (300 
mhz) CPU. 

8.1. Lid-Driven Cavity 

The lid-driven cavity is a common benchmark for recirculating flows. The second-order 
streamfunction-vorticity results of Ghia et al. [20], computed on a uniform 256 by 256 grid, are 
generally accepted as standard. Isothermal flow is set up in a square cavity with a top lid that 
moves to the right at constant speed m = 1 . In the current work, for this flow, the residual R* is 
evaluated with third order upwind in both directions on all grids. Streamfunction contours, 
computed on a uniform 128 by 128 grid, for Re = 1000 and 5000, are shown in Figure 1. Profiles 
of M on the vertical centerline, computed on uniform 128 by 128 and 256 by 256 grids, for Re = 
100, 1000 and 5000, are compared with the standard results in Figure 2. The agreement on both 
grids for all three Reynolds numbers is very good. 
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Re = 1000 


Re = 5 



X X 

FIG. 1. Streamfunction contours for lid-driven cavity at Re = 1000 and 5000. 


128 by 128 256 by 256 



FIG. 2. Profiles of u on vertical centerline for lid-driven cavity on two uniform grids at three Reynolds numbers. 
Curves are present results; symbols are from Ghia et al [20] on a 256 by 256 grid. 

The first set of performance results is for a uniform 128 by 128 grid with Re varying from 
100 to 5000. Six grids are used in the full coarsening multigrid process with computation starting 
on grid 2. Computations are performed with fil = 0.4, RCV = 2.2 and CFL = 40.0 for each 

method on all grids. Convergence plots of vs. work units are shown in Figure 3 for all three 
smoothers. One work unit is defined as the cpu time required to evaluate the residual of the 
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discrete equations on the fine grid, 0.088 seconds on the SGI for this case. Note that in the 
residual evaluation, which is the same for all three methods, the flux across each cell boundary is 
computed only once. In this figure each symbol on a plot corresponds to a single fine-grid sweep 
and the horizontal offset from the origin is the initialization time on the coarser grids. For these 
computations all three smoothers are comparable with the faster convergence of LUSGS and 
BILU compensated for by the shorter cpu time per sweep for LEGS. 


LBGS 


LUSGS 


BILU 



FIG. 3. Lid-driven cavity: convergence plots for each method on a uniform 128 by 128 grid. 


The second set of results is for a stretched 128 by 128 grid at Re = 1000 with the grid 
spacing at all four walls varying from ds^ = 0.002 to 0.00002. One-dimensional grid stretching 
in each direction is performed by a procedure that permits specification of the grid spacing at 
each end of the interval. This is described in Appendix B. Six grids are used in the multigrid 
process with computation starting on grid 2. The values of , RCV and CFL are the same as in 

the previous set. Convergence plots of vs. work units are shown in Figure 4 for all three 
smoothers. For the stretched grid LBGS and BILU are again comparable. The performance of 
LUSGS, however, rapidly degrades as the stretching increases. 
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LBGS 


LUSGS 


BiLU 



FIG. 4. Lid-driven cavity: convergence plots for each method on a stretched 128 by 128 grid for Re — 1000. 


8.2. Buoyancy-Driven Cavity 

This is a flow in a square cavity with insulated horizontal walls, vertical walls with constant 
temperatures on the left and on the right, and gravity force in the negative y-direction. We 

consider the particular case of = 4 and varying Rayleigh number Ra , where 

Ra = AT plgCp^lI^ j with AT = 2{T^ -T^ )/(r^ ) . Yu et al. [21] have reported detailed 

results for this problem, which were obtained with a least-squares finite-element method. The 
computations were performed on a 129 by 129 moderately stretched grid with bi-quadratic 
elements. In order to compare with these results we use constant viscosity p and thermal 

conductivity K and set Fr = 1 .2 , Pr = 0.7 . In the present work the residual P* is again 
evaluated with third order upwind in both directions on all grids. Streamfunction contours, 
computed on a uniform 128 by 128 grid, for Ra = 10^ and 10®, are shown in Figure 5. Profiles of 
u on the vertical centerline, computed on uniform 128 by 128 and 256 by 256 grids, for 
Pa = 10\ 10® and 10® , are compared with those of Yu et al. [21] in Figure 6. The agreement on 
both grids at Ra = 10® and 10® is very good, both with each other and with the reference, but 
those at 10®, while agreeing with each other, disagree substantially with those of the reference. 

In [21] the streamfunction plot for this case shows a somewhat different structure with two 
additional secondary vortices. The present calculations cast this finding into doubt. Note our 
computations ona512by512 grid agree with our results on the two smaller grids. 
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FIG. 5. Streamfunction contours for 


buoyancy-driven cavity at Ra = 10^ and 10^. 


128 by 128 256 by 256 




FIG. 6. Profiles of u on vertical centerline for buoyancy-driven cavity on two uniform grids at three Rayleigh 
numbers. Curves are present results; symbols are from Yu et al. [21] on a 129 by 129 grid. See text for discussion of 

disagreement at Ra= 1 0®. 

The first set of performance results for this flow is for a uniform 128 by 128 grid with Ra 
varying from 10^ to lO’ . Six grids are used in the full coarsening multigrid process with 
computation starting on grid 1. Computations are performed with Pi = 0.3, RCV = 0.8, and 
CFL = 40.0 for each method on all grids once the finest grid is reached. Lower values of CFL are 
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used during the starting process. Convergence plots of vs. work units are shown in Figure 7 
for all three smoothers. For this case one work unit is 0.089 seconds. In this case all three show 
similar behavior with convergence degrading somewhat at Ra = \0^ . 


LBGS LUSGS BILU 



FIG. 7. Buoyancy-driven cavity: convergence plots for each method on a uniform 128 by 128 grid. 

The second set of results is for a stretched 128 by 128 grid at = 10^ . In this case the 
stretching is only in the x-direction with the grid spacing at both hot and cold walls varying from 
ds^ = 0.002 to 0.00002. The grid in the y-direction is kept uniform. Six grids are used in the 

multigrid process with computation starting on grid 1. The values of Pi , RCV and CFL are the 
same as in the previous set. Convergence plots of L^R^ vs. work units are shown in Figure 8 for 

all three smoothers. As with the driven cavity, LBGS and BELU show comparable performance, 
while LUSGS rapidly degrades with increasing grid stretching. 
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FIG. 8. Buoyancy-driven cavity: convergence plots for each method on a stretched 128 by 128 grid at Ra = 10^ . 
8.3. Backward-Facing Step 

We consider isothermal flow over a backward-facing step with a step height of half the 
downstream channel height and a parabolic inlet velocity « ( y ) = 24y (0.5 -y) for 0<y< 0.5 

specified at the step. The outflow boundary is taken at 15 channel heights downstream of the 
step. Gartling [22] set up this problem as a test for outflow boundary conditions. Numerical 
results were obtained for Re = 800 with the outflow boundary at 30 channel heights, and used a 
Galerkin-based finite element method with 800 by 40 biquadratic velocity elements and linear 
discontinuous pressure elements. For this flow the residual /?* is evaluated with third order 
upwind in the y-direction on all grids while in the x-direction we use first order upwind with 
defect correction to third order on the fine grid only. Streamfunction contours, computed on a 
uniform 480 by 64 grid, for Re = 400 and 800, are shown in Figure 9. Note the contour levels 
are the same as those used by Gartling [22]. Profiles of u computed on uniform 480 by 64 and 
960 by 128 grids at x = 7 and 15 channel heights for Re = 800 are compared with Gartling’ s [22] 
results in Figure 10. The agreement at both locations for both grids is excellent. Note that x = 15 
channel heights is the exit boundary for the current computations and the value of u is obtained 
by linear extrapolation from the interior. 


17 



FIG. 9. Streamfiinction contours for backward-facing step at Re = 400 and 800. 


480 by 64 960 by 128 




FIG. 10. Profiles of M at J. = 7 and 15 for backward-facing step on two uniform grids at Re = 800. Curves are 

present results; symbols are from Gartling [22]. 

The first set of performance results is for a uniform 480 by 64 grid with Re varying from 
200 to 800. These calculations use full coarsening with fine-grid defect correction in the x- 
direction. Defect corrections are performed every 2 to 6 fine-grid sweeps. Five grids are used in 
the multigrid process with computation starting on grid two. Computations are performed with 
Pi = 0.8, RCV = 0 (no viscous correction to P^) and CFL = 80.0 for each method on all grids. 

Convergence plots of L^R^ vs. work units are shown in Figure 1 1 for all three smoothers. Note 
the zig-zag behavior of the plots is a result of an increase in L^R^ each time a new fine-grid 
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defect correction is performed. Here one work unit is 0.17 seconds. All three methods show 
similar behavior for this case. It is also obvious, however, that the performance of all three 
methods suffers serious deterioration as Re 5 molds number is increased to 600 and 800. For Re = 
600 or 800, the deterioration with increasing grid stretching is even worse. 
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FIG. 11. Backward-facing step: convergence plots for each method on a uniform 480 by 64 grid with full 

coarsening and defect correction in X . 

In order to obtain a more robust method we introduce semi-coarsening and retain fine-grid 
defect correction, both in x . Again we use a uniform 480 by 64 grid with five grid levels and 
start the computation on grid two. For these computations, defect corrections were performed 
every two fine grid sweeps. Computations are performed with = 0.8 for LBGS and BILU, 

however we set 1.8 for LUSGS. Again we set RCV = 0 and CFL = 80.0 for each method 
on all grids. Convergence plots of vs. work units for these methods are shown in Figure 12. 
With semi-coarsening, LBGS and BILU show similar performance, but LUSGS take about twice 
as long to converge. Note that on a finer grid, 960 by 128, the relative performance of LUSGS 
with semi-coarsening is even worse. The deterioration of performance with Reynolds number, 
however, is now quite modest. Of course, as expected, the total number of work units is larger. 
Note the zig-zag behavior, when it occurs, is much less pronounced as the fine-grid defect 
correction is now much smaller. 
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FIG. 12. Backward-facing step: convergence plots for each method on a uniform 480 by 64 grid with semi- 
coarsening and defect correction in X . 

The next set of results is for a stretched 480 by 64 grid at Re = 800. Stretching is only in the 
y-direction with the grid spacing at the bottom and top walls varying from ds^ = 0.004 to 

0.00004 while the spacing at the center (top of the step) is held fixed at ds^ = 0.008. Spacing in 
the x-direction is kept uniform. As above we use semi-coarsening with defect corrections, both in 
X, every two fine-grid sweeps. Values of J3q , RCV and CFL are the same as in the previous set. 

Convergence plots of vs. work units for these methods are shown in Figure 13,. Again 
LBGS and BELU are similar, and LUSGS is much worse. What is most interesting is the 
insensitivity of these performance results to grid stretching. This is certainly desirable in a robust 
method. 
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FIG. 13. Backward-facing step: convergence plots for each method on a stretched 480 by 64 grid with semi- 
coarsening and defect correction in x at Re = 800. 

8. ■4. Backward-Facing Step with Heat transfer 

As a final test case, one with a large temperature variation, we consider the backward-facing 
step with heat transfer. The temperature of the lower channel wall in the previous flow is raised 
to twice the inlet temperature and an adiabatic wall condition is imposed on the upper channel 
wall and the step face. Again viscosity ju and thermal conductivity at are held constant. For this 

flow as before the residual /?* is evaluated with third order upwind in the y-direction on all grids 
while in the x-direction we use first order upwind with defect correction to third order on the fine 
grid only. Streamfunction contours, computed on a uniform 480 by 64 grid, for Re = 400 and 
800, are shown in Figure 14. Note that at Re = 800 the lower recirculating zone is larger and the 
upper recirculating zone is no longer present. 
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FIG. 14. Streamfunction contours for backward-facing step with heat transfer at Re — 400 and 800. 


The first set of performance results is for a uniform 480 by 64 grid with Re varying from 
200 to 800. These calculations use full coarsening with fine-grid defect correction in the ;c- 
direction every 2 fine- grid sweeps. Five grids are used in the multigrid process with computation 
starting on grid two. Computations are performed with = 0.6, RCV = 0 and CFL = 80.0 for 

each method on all grids. Convergence plots of L^R^ vs. work units are shown in Figure 15 for 
all three smoothers. Here one work unit is 0.17 seconds. In this case all methods show 
comparable performance with negligible deterioration with Reynolds number. 

LBGS LUSGS BILU 



FIG. IS. Backward-facing step with heat transfer; convergence plots for each method on a uniform 480 by 64 grid 
. with full coarsening and defect correction in x. 
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Next we apply the same methods on a stretched 480 by 64 grid at Re = 800. Stretching is 
only in the y-direction with grid spacing at the bottom and top walls varying from ds^ — 0.004 to 

0.00004 while grid spacing at the center is held fixed at ds^ - 0.008. Values of , RCV and 
CFL are the same as in the previous set, except that is now 1.0 for LUSGS. Convergence 

plots of vs. work units are shown in Figure 16 for all three smoothers. Again all methods 
show comparable performance with modest degradation with increasing grid stretching except 
that LUSGS has greater difficulty at the finest spacing. 
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FIG. 16. Backward-facing step with heat transfer: convergence plots for each method on a stretched 480 by 64 grid 

with full coarsening and defect correction in x at Re — 800. 

Although semi-coarsening does not appear to be necessary for this problem, it is 
nevertheless instructive to examine its performance in this case. We return to the uniform 480 by 
64 grid with five grid levels and start the computation on grid two. Computations are performed 

with Pi = 0.6 for LBGS Jind BILU, however, now we set fil = 3.0 for LUSGS. Again we set 

RCV = 0 and CFL = 80.0 for each method on all grids. Convergence plots of L^R^ vs. work 
units are shown in Figure 17 for all three smoothers. Here LBGS and BILU give comparable 
performance while LUSGS is much worse. 
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FIG. 17. Backward-facing step with heat transfer; convergence plots for each method on a uniform 480 by 64 grid 

with semi-coarsening and defect correction in x. 

Finally, we consider stretched grid performance with semi-coarsening and defect correction 
in X. Values of , RCV and CFL are the same as in the previous set. Convergence plots of 

vs. work units are shown in Figure 18 for all three smoothers. Again LBGS and BILU give 
comparable performance while LUSGS is much worse with convergence almost ceasing at the 
two finest wall grid spacings. 
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FIG. 18. Backward-facing step with heat transfer; convergence plots for each method on a stretched 480 by 64 grid 

with semi-coarsening and defect correction in x at Re = 800. 
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9. CONCLUSIONS 


The performance of multigrid solvers for the compressible Navier-Stokes equations at low 
speeds and large temperature variations has been investigated. Each solver employed time- 
derivative preconditioning together with preconditioned flux-differencing splitting of the inviscid 
terms. Three implicit smoothers were incorporated in a common multigrid procedure and the 
resulting methods were tested on four 2D laminar problems over a range of Reynolds numbers 
with both uniform and highly stretched grids. 

For predominantly recirculating flows on uniform grids, the performance of all three 
methods is comparable and really quite good although a moderate slowing of convergence is 
seen with increasing Reynolds number. On highly stretched grids, however, the convergence rate 
of LUSGS deteriorates quite rapidly while the performance of LEGS and BILU remains 
insensitive to the stretching. It should be noted that in the case of the buoyancy-driven cavity 
flow none of the methods has any difficulty with the four to one ratio of wall temperatures. 

For cases with a predominant flow direction, it appears that a robust method can be 
constructed on the basis of semi-coarsening and fine grid defect correction, both in the main flow 
direction. Such methods cost more per multigrid cycle but tend to converge more reliably and in 
fewer cycles. With semi-coarsening LEGS and EILU show comparable good performance on 
both uniform and highly stretched grids again with moderate slowing of convergence at the 
higher Reynolds numbers. The performance of LUSGS with semi-coarsening, on the other hand, 
degrades substantially on uniform grids and may get even worse on highly stretched grids. In the 
case of the backward-facing step with heat transfer, we again note that none of the methods has 
any difficulty with the two to one ratio of bottom wall to inlet temperatures. 

From the present studies it is evident that an efficient robust multigrid solver can be based 
on either LEGS or EBLU with either full or semi-coarsening, depending on the problem. It 
should be noted that EILU tends to be the more robust of the two showing a net decrease in cpu 
time at the more difficult higher Reynolds number cases albeit at a cost of a significant increase 
in global storage. Finally, we also note the benefit to be obtained from the use of directional 
defect correction in the case of a predominant flow direction. 

APPENDIX A 

The block matrix coefficients in Eq.(24) are obtained by expansion of Eq.(19). For an 
interior cell these are given as follows; 
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A.y =-[aVv2J \-\l2^dyj, 

Aj = i,i-\/ 2 dy~^ j-\i 2 ] dx ^ , 

C^j = +[5~i.j+V2 -^”’i,j+V2<iy~S+i/2]^P 

=+|^A i+\i2j ~ R'"i+]j2,jdx 'i+i/ijdyy, (34) ■ 

Bj j = +|^A^i+l/2,y — A i-l/2,y + /?”'i+l/2,y<ic ^1+^2 + R’°‘i-l/2jdx \-l/2jti)'y 
+|^fi^'.y+i/2 — B (,j-i/2 + R^i,j+i/2dy ^j+]j2 + R^ i.j-i/2dy~^ j-\/2~^dXi 
+{Bq ~^ ). . dx^yj/dt - D^jdx^dyj. 

When the cell (/, j) is located adjacent to a boundary, as noted in Section 4, the inviscid 
surface flux is obtained directly from Eq.(3) and the viscous flux is approximated by a one-sided 
second-order difference. As a result the above matrix coefficients must be modified. As an 
example, let the cell face (i -1/2, y) lie on the left boundary. Then in Eq.(34) the coefficients 

j and E^J are unchanged, D,. is set to zero, and B, ^ is changed as follows 

B^ j = -|-j^A^i+l/2,y -|- /?**i+l/2,y'<ix *i+l/2 + 3/?**i-I/2,y<ir \-l/2j</yy 
- [ A-i/ 2 ,y + 3/?"/-V2.yd[x'',-V2 1 dy 

(35) 

+ [^5'^/,y+l/2 —B~iJ-\/2 + R^iJ+\/2dy~^ j+1/2 + /?^i,>-l/2dy’'y-I/2 J 
-I- ( Pg"' ^ dx.dyj /dt - D^jdXidyj . 

APPENDIX B 

We have employed a mesh clustering procedure that permits specification of the grid 
spacing at both ends of the interval. Let Sj,j = 0 to n , denote the grid points along a line with 

the locations 5'o,5j,5„_i,5„ specified. Also define dsj =(sJ^^-SJ_^)/2. Then we take 


-d] = -SJ_^ + 2s j - - PjdSj = 0 for y = 1 • • • n , 




5^-5, =0 

-P;_,+2p^.-p^.^l=0 

■yy-Vi=o 


fory = l, 

for y = 2- n-l, 

for y = n — 1. 


(36) 


Eqs.(36) are linearized about the current state (2* where 
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Then 


Q) = ’ Pj ^ ^Qj = QT ~ Qj ^Qo = ^Qn 


-AjAQj_,+BjAQj-C,AQj^,=Dj, 


( 37 ) 


with 


^y 


^y 


^y 


^y 


'l-Pj/2 0 
0 0 
'l-Pj/2 0 

0 1 

'l-Pj/2 0 

0 0 


d' 

j 

. I . 



"2 

-dSj' 

’C, = 

'1 + Pjl2 

o' 

’ J 

1 

0 

’ J 

0 

0_ 


'2 

-dSj' 


1 + Pjl2 

o' 

’ J 

0 

2 

’ J 

0 

1 

= 

'2 

-dSj 


l + Pj/2 

o' 

’ J 

_1 

0 

’ J 

0 

0_ 


for 7 = 1, 

for 7 = 2---n-2, 

for j = n-\. 


(38) 


Qj is initialized by distributing Sj uniformly between Sq and and setting pj = 0 for all 

j. Then Eq.(37) is solved for AQj by block-tridiagonal inversion and Qj is updated. The 

process continues until the norm of Apj is reduced to less than 10"^, typically 5 to 15 

iterations. For a highly stretched case, the resulting grid spacing increases rapidly but smoothly 
from the beginning of the interval and then levels off to become nearly constant over the center 
of the interval. 
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